1.0  SUMMARY

This document constitutes the final report under Contract MDA903-80-C-0498,
DARPA Order No. 3984, on research in efficient numerical methods for
two-dimensional VLSI process modeling. The major output of this work is
embodied in the computer code MEMBRE (for Multidimensional Efficient Moving
Boundary Redistribution) and the associated user's manual, which was published
as the Appendix to semi-annual technical report No. 3 under this contract.

This code predicts the two-dimensional dopant profiles which evolve during
oxidation and drive-in, including those effects arising from formation of the
bird's beak oxide profile and from concentration dependence of the diffusivity
at high doping levels. Typical CPU times for a complete redistribution
process, involving several thermal cycles, are found to range from two to ten
minutes on the IBM 3033.

In the final phase of the contract, modifications have been introduced
in the basic code to increase the speed of solution for the more difficult
cases (e.g., arsenic drive-in). A factor of two reduction in CPU time
was found to be achievable without deterioration in solution accuracy through
the use of nonuniform spatial grids.

MEMBRE has been made available on tape to the U.S. integrated circuits
community in the form of a FORTRAN program executable on the IBM 3033.
It contains the basic input/output routines from Stanford's one-dimensional
process code SUPREM, modified to deal with two-dimensional process data. Its
capabilities have been described in presentations at two technical meetings [1,2]
and in an article to appear in the new journal COMPEL [3]. Eight requests for
the code from integrated circuit manufacturers and researchers in the U.S.
have been processed to date.

1.1  Task  Objectives 


The overall objective of this program has been to develop fast and
accurate methods for computer modeling of the two-dimensional spread of
dopants and other defects during VLSI circuit fabrication. Our goals for the
first year were to demonstrate a fast algorithm for calculating nonlinear
diffusion of a single dopant during nonuniform oxide growth, and to provide this
algorithm in a form suitable for incorporation into a general process
simulator such as Stanford's SUPREM. These goals have been accomplished.

The specific objectives for the second year included:

1. Effective transfer of the basic algorithm to the integrated
circuits community;

2. Extension of the code to treat multiple interacting species and
three-dimensional redistribution; and

3. Exploration of the computational requirements posed by better
physical models for the underlying processes of chemical reaction
and defect generation and migration.

1.2  Technical  Problem 

The fabrication of VLSI devices requires production of features of
submicron size and separation. Electrical characteristics such as threshold
and punchthrough voltages will be sensitive to dopant spread into critical
areas adjacent to the original features. Experimental control of this
spreading, without guidance from accurate computer modeling, will be costly,
tedious, and time-consuming. However, the use of standard numerical methods
to achieve an adequate modeling capability is also costly and
time-consuming. One should therefore seek advanced methods, drawn from areas such
as fluid dynamics, where considerable effort and ingenuity have been expended
in recent years to develop fast and accurate solvers for the characterization
of multidimensional, time-dependent phenomena.

Based on our own ongoing research in computational nonlinear
aerodynamics, we identified several promising approaches to the development of a
fast solver for two-dimensional diffusion problems. After a preliminary
screening, a few of these were selected for adaptation to the problem of
dopant spread during oxidation or annealing. These algorithms were tested for
speed and accuracy on the problem of nonlinear dopant diffusion into the
channel region of a MOSFET, as well as on simpler problems for which the
actual dopant profiles could be accurately obtained by other means.


The algorithm finally selected for further development provided not
only exceptional speed, but also a natural extendability to interacting defect
species and three-dimensional diffusion.


1.4  Technical  Results 

The specific improvements in MEMBRE accomplished during the final
phase of this contract are described in Section 2 of this report. They
include a technique for clustering grid points in regions of large spatial
gradient (nonuniform gridding) and optimization of the code for execution on
the Cray-1 computer, which has become generally available for engineering
calculations during the past year.


1.5  Important  Findings  and  Conclusions


We restate here the conclusions presented in semiannual technical
report No. 3, as they provide an appropriate summary for the total contract
effort. The speed with which MEMBRE can predict the effect of process
conditions on 2-D dopant spread should make the code a useful tool in the
interactive design of VLSI fabrication processes. Many of the most common features
in MOSFET fabrication fall within the modeling capabilities of the present
code. However, it should be remembered that this code is intended only as a
demonstration of 2-D modeling capabilities. It is not a complete process
simulator, nor has it been optimized for its present use. Rather, it is
presented in a format designed to encourage adaptation and extension. The basic
algorithm, whose software implementation is included in MEMBRE, is capable of
solving problems of much greater complexity.

1.6  Implications  for  Further  Research 

The development of efficient numerical methods for the various phases
of VLSI design is an important goal that is partially realized in MEMBRE.
Achieving comparable speed in the computation of device electrical
characteristics and circuit transient behavior appears more difficult, based on the
available numerical algorithms. Researchers active in this area may benefit
from an investigation of the applicability of the ideas underlying the fast
solver in MEMBRE to their problems.


2.0  PROGRESS  ON  TWO-DIMENSIONAL  PROCESS  MODELING

Two major developments were accomplished during the last reporting
period: (1) the computer code MEMBRE was modified to allow for a variable
spatial grid, and (2) the program was vectorized and converted for execution on
the Cray-1 computer. The variable grid feature is very important for the
redistribution of arsenic implants, which usually have large gradients over
short distances.

The computational rectangle is now covered by a nonuniform, arbitrary
grid with coordinates $(\xi_i, \eta_j)$. Spatial derivatives are discretized
using the following approximations:

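$$\left.\frac{\partial N}{\partial \xi}\right|_{i,j} = \frac{N_{i+1,j} - N_{i-1,j}}{\xi_{i+1} - \xi_{i-1}}$$

$$\left.\frac{\partial N}{\partial \eta}\right|_{i,j} = \frac{N_{i,j+1} - N_{i,j-1}}{\eta_{j+1} - \eta_{j-1}}$$

$$\frac{\partial}{\partial \xi}\left[D(N)\frac{\partial N}{\partial \xi}\right]_{i,j} = \frac{2}{\xi_{i+1} - \xi_{i-1}}\left[\frac{N_{i+1,j} - N_{i,j}}{\xi_{i+1} - \xi_i}\,D_{i+1/2,j} - \frac{N_{i,j} - N_{i-1,j}}{\xi_i - \xi_{i-1}}\,D_{i-1/2,j}\right]$$

$$\frac{\partial}{\partial \eta}\left[D(N)\frac{\partial N}{\partial \eta}\right]_{i,j} = \frac{2}{\eta_{j+1} - \eta_{j-1}}\left[\frac{N_{i,j+1} - N_{i,j}}{\eta_{j+1} - \eta_j}\,D_{i,j+1/2} - \frac{N_{i,j} - N_{i,j-1}}{\eta_j - \eta_{j-1}}\,D_{i,j-1/2}\right]$$

$$\frac{\partial}{\partial \eta}\left[D(N)\frac{\partial N}{\partial \xi}\right]_{i,j} = \frac{D_{i,j+1}\,(N_{i+1,j+1} - N_{i-1,j+1}) - D_{i,j-1}\,(N_{i+1,j-1} - N_{i-1,j-1})}{(\eta_{j+1} - \eta_{j-1})\,(\xi_{i+1} - \xi_{i-1})}$$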


C/4836A/cb 


Rockwell  International 

Science  Center 
SC5271.12FR 


where $N_{i,j} = N(\xi_i, \eta_j, \tau)$, $D_{i+1/2,j} = D[(N_{i+1,j} + N_{i,j})/2]$,
$D_{i,j} = D(N_{i,j})$, etc. As before, D(N) represents a nonlinear
concentration-dependent diffusivity. In order to incorporate the above difference
equations into the original uniform-grid code, modifications were required in
the following subroutines: ASET, DIFFUN, INIT, LDNXI1, LOADDX, LOADDY, LOADJX,
LOADJY, LOADNX, LODNYX, MAIN, OUTPT, and UFCT. Much of the symmetry present in
the uniform grid code is no longer available; however, some computational
savings may still be made. For example, if the FORTRAN variables DSAVE(I) and
XISEP(I) represent


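$$\mathrm{DSAVE}(I) = D_{i-1/2,j}\,\frac{N_{i,j} - N_{i-1,j}}{\xi_i - \xi_{i-1}}$$

$$\mathrm{XISEP}(I) = \frac{2}{\xi_{i+1} - \xi_{i-1}}$$

for any fixed j, then

$$\frac{\partial}{\partial \xi}\left[D(N)\frac{\partial N}{\partial \xi}\right]_{i,j} = \mathrm{XISEP}(I)\,\bigl(\mathrm{DSAVE}(I+1) - \mathrm{DSAVE}(I)\bigr)$$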
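
The saving described above can be illustrated with a minimal one-dimensional sketch in Python. This is not the MEMBRE implementation: the function name `diffusion_term` and its arguments are hypothetical, and the boundary points are simply skipped. It only shows that each half-point quantity is computed once and then shared by the two neighbouring grid points.

```python
def diffusion_term(xi, conc, diffusivity):
    """Approximate d/dxi [ D(N) dN/dxi ] at the interior points of a 1-D grid.

    xi          -- strictly increasing grid coordinates (may be nonuniform)
    conc        -- concentration N at each grid point, for one fixed j
    diffusivity -- callable D(N); any concentration dependence is allowed
    """
    npts = len(xi)
    # dsave[i] plays the role of DSAVE(I): the half-point quantity
    # D_{i-1/2} * (N_i - N_{i-1}) / (xi_i - xi_{i-1}).
    # Index 0 has no left neighbour and is left unused in this sketch.
    dsave = [0.0] * npts
    for i in range(1, npts):
        d_half = diffusivity(0.5 * (conc[i] + conc[i - 1]))
        dsave[i] = d_half * (conc[i] - conc[i - 1]) / (xi[i] - xi[i - 1])

    result = []
    for i in range(1, npts - 1):
        # xisep plays the role of XISEP(I) = 2 / (xi_{i+1} - xi_{i-1}).
        xisep = 2.0 / (xi[i + 1] - xi[i - 1])
        result.append(xisep * (dsave[i + 1] - dsave[i]))
    return result


if __name__ == "__main__":
    grid = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8]
    profile = [x * x for x in grid]
    print(diffusion_term(grid, profile, lambda n: 1.0))
```

For a constant diffusivity and the profile N = ξ², the formula reproduces the exact second derivative (2) at every interior point, up to rounding, whatever the grid spacing, which makes a convenient sanity check.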


Most of the algorithms in MEMBRE vectorized automatically when using
the Cray-1 compiler. However, two subroutines, LOADDX and LOADDY, make
millions of calls to the external diffusivity function D(N). If this function
is included as a statement function in LOADDX and LOADDY, a factor of two
improvement in execution speed can be obtained. Below are the new listings of
these subroutines on the Cray-1 for the nonuniform code.

Presently, the variable grid is loaded using FORTRAN statements in
the MAIN program. Eventually, the SUPREM input subroutines will be modified
to allow for a variable grid input from a new GRID card.

      SUBROUTINE LOADDX(J,Y,DSAVE)
C     THIS SUBROUTINE COMPUTES THE DISCRETIZED DIFFERENCE EQUATION IN
C     THE X-DIRECTION USING THE METHOD OF LINES
      IMPLICIT REAL*8(A-H,O-Z)
      COMMON/PARM1/NX,NXP1,NY,NYP1,NXM1,JB1,JB2,JB1P1,JB2M1
      COMMON/PARM2/RDX2,RDY2,RHDX2,RHDY2,R2DX,R4DXDY,R2DY
      COMMON/PARM3/BETA,DBETA1,DBETA2,DBETA3,NI,HALFNI
      COMMON/SRDX2/SRDX2(1)
      DIMENSION DSAVE(1),Y(1)
C     DIFFUSIVITY AS A STATEMENT FUNCTION (INLINED FOR VECTORIZATION)
      D(A,B) = DBETA1*(1.0+BETA*(A+B))*(1.0+A/B)
      ICOL = (J-1)*NX
      DO 10 I = 2,NX
      ISUB = ICOL+I
      ISUBM1 = ISUB-1
      YH = 0.5D0*(Y(ISUB)+Y(ISUBM1))
      ALPHA = HALFNI*YH
C     VECTOR MERGE: ALPHA IS SET TO ZERO WHEREVER IT IS NEGATIVE
      ALPHA = CVMGT(0.0,ALPHA,ALPHA.LT.0.0)
      TERM = SQRT(ALPHA*ALPHA+1.0)
      DSAVE(I) = SRDX2(I)*D(ALPHA,TERM)*(Y(ISUB)-Y(ISUBM1))
   10 CONTINUE
      DSAVE(1) = -DSAVE(2)
      DSAVE(NXP1) = -DSAVE(NX)
      RETURN
      END

      SUBROUTINE LOADDY(I,Y,DSAVE)
C     THIS SUBROUTINE COMPUTES THE DISCRETIZED DIFFERENCE EQUATION IN
C     THE Y-DIRECTION USING THE METHOD OF LINES
      IMPLICIT REAL*8(A-H,O-Z)
      COMMON/PARM1/NX,NXP1,NY,NYP1,NXM1,JB1,JB2,JB1P1,JB2M1
      COMMON/PARM2/RDX2,RDY2,RHDX2,RHDY2,R2DX,R4DXDY,R2DY
      COMMON/PARM3/BETA,DBETA1,DBETA2,DBETA3,NI,HALFNI
      COMMON/SRDY2/SRDY2(1)
      DIMENSION DSAVE(1),Y(1)
C     DIFFUSIVITY AS A STATEMENT FUNCTION (INLINED FOR VECTORIZATION)
      D(A,B) = DBETA1*(1.0+BETA*(A+B))*(1.0+A/B)
      IC1 = I-NX
      IC2 = IC1-NX
      DO 10 J = 2,NY
      YH = 0.5D0*(Y(NX*J+IC1)+Y(NX*J+IC2))
      ALPHA = HALFNI*YH
C     VECTOR MERGE: ALPHA IS SET TO ZERO WHEREVER IT IS NEGATIVE
      ALPHA = CVMGT(0.0,ALPHA,ALPHA.LT.0.0)
      TERM = SQRT(ALPHA*ALPHA+1.0)
      DSAVE(J) = SRDY2(J)*D(ALPHA,TERM)*(Y(NX*J+IC1)-Y(NX*J+IC2))
   10 CONTINUE
      DSAVE(1) = -DSAVE(2)
      DSAVE(NYP1) = -DSAVE(NY)
      RETURN
      END
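
For readers without access to a Cray FORTRAN compiler, the arithmetic inside the loop of the two listings can be mirrored in a short Python sketch. This is an illustration only, not part of MEMBRE. The function names are hypothetical, the parameter values must be supplied by the caller, and the weight array `inv_spacing` is assumed to play the role of SRDX2 or SRDY2, whose exact definition is not given in the listings. `CVMGT(A,B,C)` is the Cray vector merge, which returns A where C is true and B otherwise.

```python
import math


def merge_nonnegative(alpha):
    """Mimic CVMGT(0.0, ALPHA, ALPHA.LT.0.0): 0.0 where alpha < 0, else alpha."""
    return 0.0 if alpha < 0.0 else alpha


def face_diffusivity(y_left, y_right, halfni, beta, dbeta1):
    """Diffusivity at the midpoint between two neighbouring grid values."""
    yh = 0.5 * (y_left + y_right)
    alpha = merge_nonnegative(halfni * yh)
    term = math.sqrt(alpha * alpha + 1.0)
    # Same expression as the statement function D(A,B) with A=alpha, B=term.
    return dbeta1 * (1.0 + beta * (alpha + term)) * (1.0 + alpha / term)


def load_flux(y, inv_spacing, halfni, beta, dbeta1):
    """Fill a DSAVE-like array along one grid line (0-based indexing).

    y has nx entries; the result has nx + 1 entries. Entries 1..nx-1 come
    from the loop, and the two end entries are set by reflection, as in
    DSAVE(1) = -DSAVE(2) and DSAVE(NXP1) = -DSAVE(NX).
    """
    nx = len(y)
    dsave = [0.0] * (nx + 1)
    for i in range(1, nx):
        d = face_diffusivity(y[i - 1], y[i], halfni, beta, dbeta1)
        dsave[i] = inv_spacing[i] * d * (y[i] - y[i - 1])
    dsave[0] = -dsave[1]
    dsave[nx] = -dsave[nx - 1]
    return dsave
```

Keeping the diffusivity as a small inline expression, rather than a call to an external routine, is what allows the loop body to vectorize in the FORTRAN version; the Python sketch preserves only the arithmetic, not the performance characteristics.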

Two cases were studied to determine the effect of the variable grid
on accuracy and execution time on the Cray-1.

Case  1.  Redistribution  of  Boron  Field  Implant 

This case consists of the development of a bird's beak. The silicon
slab is 2 μm x 3 μm. In the uniform case, $\Delta\xi = \Delta\eta = 0.05$ μm, which
produces a spatial grid of 41 x 61 points. A 150 keV boron implant of dose
$2.5 \times 10^{12}$ cm$^{-2}$ is simulated. The first cycle consists of oxidation at a
temperature of 1000°C for 20 minutes in nitrogen. This is followed by a
nonuniformly moving boundary cycle for 160 minutes in steam at 1000°C. The actual
SUPREM-like input data are listed below for the uniform grid case.


TITL   BORON FIELD IMPLANT, E-NMOS
GRID   DYSI=0.06,DPTH=0.05,YMAX=2,DELY=0.05,YLMX=3
SUBS   ORNT=100,ELEM=+,CONC=5E14
COMM   STARTING OXIDE THICKNESS OF 0.005 UM
STEP   TYPE=DEPO,TIME=1,GRTE=0.005
PLOT   TOTL=Y
PRINT  TOTL=Y,HEAD=Y
COMM   150 KEV BORON IMPLANT
STEP   TYPE=IMPL,ELEM=B,DOSE=2.5E12,AKEV=150,YDEV=0.148,
+      YWIN=1.
PLOT   TOTL=N
COMM   FIELD OXIDATION
STEP   TYPE=OXID,TEMP=1000,TIME=20,MODL=NIT0
STEP   TYPE=OXID,TEMP=1000,TIME=160,MODL=WET0,RATO=0.3333,
+      YPEN=0.075
END

The 27 x 44 nonuniform grid for this problem is as follows:

XI GRID

0.0000  0.0500  0.1000  0.1500  0.2000  0.2500  0.3000  0.3500
0.4000  0.4500  0.5000  0.5500  0.6000  0.6500  0.7000  0.7500
0.8000  0.8500  0.9000  0.9500  1.0000  1.1000  1.2000  1.4000
1.6000  1.8000  2.0000

ETA GRID

0.0000  0.0500  0.1000  0.1500  0.2000  0.2500  0.3000  0.3500
0.4000  0.4500  0.5000  0.5500  0.6000  0.6500  0.7000  0.7500
0.8000  0.8500  0.9000  0.9500  1.0000  1.0500  1.1000  1.1500
1.2000  1.2500  1.3000  1.3500  1.4000  1.4500  1.5000  1.5500
1.6000  1.6500  1.7000  1.7500  1.8000  1.9000  2.0000  2.2000
2.4000  2.6000  2.8000  3.0000

Although accuracy is essentially the same for both grids in all areas of
critical interest, CPU time is substantially lower for the variable grid:
10.365 seconds versus 18.902 seconds. Using a cruder nonuniform grid would
yield a further improvement in speed, but accuracy would begin to deteriorate.
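
The Case 1 grid can be regenerated from a few (spacing, count) segments, which also makes the point counts easy to compare. The sketch below is illustrative only: the helper `graded_grid` and the segment lists are hypothetical names, chosen to reproduce the tabulated XI and ETA values, and MEMBRE itself loads the grid through FORTRAN statements in MAIN.

```python
def graded_grid(segments, start=0.0):
    """Build a 1-D grid from (spacing, count) segments, beginning at `start`."""
    points = [start]
    for spacing, count in segments:
        for _ in range(count):
            # Round to four decimals to match the tabulated grid values.
            points.append(round(points[-1] + spacing, 4))
    return points


# Fine spacing near the surface, coarser spacing deeper in the slab.
XI_SEGMENTS = [(0.05, 20), (0.1, 2), (0.2, 4)]
ETA_SEGMENTS = [(0.05, 36), (0.1, 2), (0.2, 5)]

xi = graded_grid(XI_SEGMENTS)
eta = graded_grid(ETA_SEGMENTS)

nonuniform_points = len(xi) * len(eta)   # 27 x 44 grid
uniform_points = 41 * 61                 # uniform 0.05 um grid
point_ratio = uniform_points / nonuniform_points
time_ratio = 18.902 / 10.365             # CPU seconds quoted for Case 1

if __name__ == "__main__":
    print(len(xi), len(eta), nonuniform_points, uniform_points)
    print(round(point_ratio, 2), round(time_ratio, 2))
```

The nonuniform grid has 1188 points against 2501 for the uniform grid, so the reduction in CPU time for this case is somewhat smaller than the reduction in point count.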

Case  2.  Redistribution  of  Arsenic  Source/Drain  Implant 

This case consists of the oxidation for 40 minutes of a very steep
40 keV arsenic profile having a projected range (Rp) of 0.0265 μm and projected
and lateral standard deviations of 0.0099 μm and 0.0103 μm, respectively. An
explosive type of diffusion is characteristic of such problems, and a very fine
uniform grid of 51 x 61 on a silicon slab of 1.25 μm x 1.5 μm was required to
maintain accuracy. The initial data are listed below:

TITL   ARSENIC SOURCE/DRAIN IMPLANT, E-NMOS
GRID   DYSI = 0.025, DPTH = 0.025, YMAX = 1.25, DELY = 0.025,
+      YLMX = 1.5
SUBS   ORNT = 100, ELEM = +, CONC = 5E14
PLOT   TOTL = Y, CMIN = 14, NDEC = 4, WIND = 2
PRINT  TOTL = Y, HEAD = Y
COMM   40 KEV ARSENIC IMPLANT
STEP   TYPE = IMPL, ELEM = AS, DOSE = 1E16, RANG = 0.0265, STDV =
+      0.0099, YWIN = 0.5, YDEV = 0.0103
COMM   ARGON ANNEAL
STEP   TYPE = OXID, TEMP = 1000, TIME = 40, MODL = NIT0
END

The following 33 x 47 nonuniform grid produced the same quality of accuracy:

XI GRID

0.0000  0.0250  0.0500  0.0750  0.1000  0.1250  0.1500  0.1750
0.2000  0.2250  0.2500  0.2750  0.3000  0.3250  0.3500  0.3750
0.4000  0.4250  0.4500  0.4750  0.5000  0.5250  0.5500  0.5750
0.6000  0.6500  0.7000  0.7500  0.8500  0.9500  1.0500  1.1500
1.2500

ETA GRID

0.0000  0.0250  0.0500  0.0750  0.1000  0.1250  0.1500  0.1750
0.2000  0.2250  0.2500  0.2750  0.3000  0.3250  0.3500  0.3750
0.4000  0.4250  0.4500  0.4750  0.5000  0.5250  0.5500  0.5750
0.6000  0.6250  0.6500  0.6750  0.7000  0.7250  0.7500  0.7750
0.8000  0.8250  0.8500  0.8750  0.9000  0.9250  0.9500  0.9750
1.0000  1.0500  1.1000  1.2000  1.3000  1.4000  1.5000

Computation times were 20.091 seconds on the 33 x 47 grid and 36.008 seconds
on the 51 x 61 uniform grid. These results indicate that roughly a factor of
two improvement in computation time can be achieved when using a nonuniform
grid. In addition, the Cray-1 is about 13 times faster than the IBM 3033.
Thus, a 26-fold improvement in computer execution time was achieved over our
previously reported data.
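
As a check on the figures above, the arithmetic can be written out from the quoted timings. The 26-fold figure combines the rounded factor of two with the Cray-1 factor of 13. The last line uses the measured Case 2 ratio instead, on the assumption (not stated in the report) that the machine factor of 13 applies equally to this case.

```latex
\begin{align*}
\frac{36.008\ \text{s}}{20.091\ \text{s}} &\approx 1.79
  && \text{(Case 2, uniform vs. nonuniform grid)} \\
13 \times 2 &= 26
  && \text{(quoted overall improvement)} \\
13 \times 1.79 &\approx 23
  && \text{(same estimate with the measured ratio)}
\end{align*}
```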